Feature selection methods and their combinations in high-dimensional classification of speaker likability, intelligibility and personality traits

Authors

  • Jouni Pohjalainen
  • Okko Johannes Räsänen
  • Serdar Kadioglu
Abstract

This study focuses on feature selection in paralinguistic analysis and presents recently developed supervised and unsupervised methods for feature subset selection and feature ranking. Using the standard k-nearest-neighbors (kNN) rule as the classification algorithm, the feature selection methods are evaluated individually and in different combinations in seven paralinguistic speaker trait classification tasks. In each analyzed data set, the total number of features greatly exceeds the number of data points available for training and evaluation, which makes a well-generalizing feature selection process extremely difficult. The performance of feature sets on the feature selection data is observed to be a poor indicator of their performance on unseen data. The studied feature selection methods clearly outperform a standard greedy hill-climbing selection algorithm by being more robust against overfitting. When the selection methods are suitably combined with each other, classification performance can be further improved. In general, it is shown that automatic feature selection in paralinguistic analysis can reduce the number of features to a fraction of the original feature set size while still achieving comparable or even better performance than baseline support vector machine or random forest classifiers using the full feature set. The features most typically selected for recognition of speaker likability, intelligibility and five personality traits are also reported.
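
As a rough illustration of the baseline the abstract compares against, the following minimal sketch runs a greedy forward hill-climbing feature selection wrapped around a kNN classifier and then checks the chosen subset on unseen data. It is not the authors' method or code: the function names (`knn_predict`, `loo_accuracy`, `greedy_forward_selection`, `holdout_accuracy`, `make_data`), the leave-one-out scoring, the value k=3 and the synthetic data are all assumptions made for this example.

```python
import random


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training points, using only `features`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out kNN accuracy on the feature selection data."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], features, k) == y[i]
    return hits / len(X)


def greedy_forward_selection(X, y, max_features=5, k=3):
    """Add one feature at a time while the leave-one-out score keeps improving."""
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(len(X[0])) if f not in selected]
        if not candidates:
            break
        scores = {f: loo_accuracy(X, y, selected + [f], k) for f in candidates}
        f_best = max(candidates, key=lambda f: scores[f])
        if scores[f_best] <= best:
            break  # hill climbing stops at the first step with no improvement
        selected.append(f_best)
        best = scores[f_best]
    return selected, best


def holdout_accuracy(train_X, train_y, test_X, test_y, features, k=3):
    """kNN accuracy on data that played no part in the selection."""
    hits = sum(
        knn_predict(train_X, train_y, x, features, k) == t
        for x, t in zip(test_X, test_y)
    )
    return hits / len(test_X)


def make_data(n, n_features, rng):
    """Synthetic data (assumption): only feature 0 carries class information."""
    y = [i % 2 for i in range(n)]
    X = [
        [rng.gauss(2.0 * label if f == 0 else 0.0, 1.0) for f in range(n_features)]
        for label in y
    ]
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    # More features (60) than data points (30), as in the setting described above.
    X_sel, y_sel = make_data(30, 60, rng)
    X_new, y_new = make_data(30, 60, rng)
    selected, sel_score = greedy_forward_selection(X_sel, y_sel)
    print("selected features:", selected)
    print("accuracy on selection data:", sel_score)
    print("accuracy on unseen data:",
          holdout_accuracy(X_sel, y_sel, X_new, y_new, selected))
```

Comparing the two printed accuracies gives a feel for the gap the abstract describes: with many more features than data points, a greedy search can pick noise features that happen to score well on the selection data.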

Similar resources

Pitch and Intonation Contribution to Speakers' Traits Classification

The article describes the system we submitted for the three sub-challenges of INTERSPEECH 2012 Speaker Trait Challenge for the classification of the five personality traits of OCEAN, likability and intelligibility. The system was based on a two-class SVM-classifier using leave-one-speaker-out cross-validation to optimize SVM complexity parameter and to select the feature set for trait classific...

The INTERSPEECH 2012 Speaker Trait Challenge

The INTERSPEECH 2012 Speaker Trait Challenge provides for the first time a unified test-bed for ‘perceived’ speaker traits: Personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In this paper, we describe these three Sub-Challenges, Challenge conditions, baselines, and a new feature set by the openSMILE toolkit, provided to the...

A Survey on perceived speaker traits: Personality, likability, pathology, and the first challenge

The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits – the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state-of-the-art in these three fields of research and describe the three sub-cha...

Genetic Algorithm Based Feature Selection for Speaker Trait Classification

Personality, likability, and pathology are important speaker traits that convey rich information beyond the actual language. They have promising applications in human-machine interaction, health informatics, and surveillance. However, they are less researched than other paralinguistics phenomena such as emotion, age and gender. In this paper we propose a novel feature selection approach for spe...

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

Journal:
  • Computer Speech & Language

Volume 29

Publication date: 2015